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Abstract 


/ 

v 

Tensor  methods  are  a  class  of  general  purpose  methods  for  solving  systems  of  nonlinear 
equations.  They  are  especially  intended  to  efficiently  solve  problems  where  the  Jacobian 
matrix  at  the  solution  is  singular  or  ill-conditioned,  while  remaining  at  least  as  efficient  as 
standard  methods  on  nonsingular  problems.  Their  distinguishing  feature  is  that  they  base  each 
iteration  on  a  quadratic  model  of  the  nonlinear  function.  The  model  has  a  simple  second  order 
term  that  allows  it  to  interpolate  more  information  about  the  nonlinear  function  than  stan¬ 
dard,  linear  model  based  methods,  without  significantly  increasing  the  cost  of  forming,  storing, 
or  solving  the  model. 

This  paper  summarizes  two  types  of  tensor  methods,  derivative  tensor  methods  that  cal¬ 
culate  an  analytic  or  finite  difference  Jacobian  at  each  iteration,  and  secant  tensor  methods 
that  avoid  Jacobian  evaluations.  Both  are  shown  to  require  no  more  function  or  derivative 
information  per  iteration,  and  hardly  more  storage  or  arithmetic  operations  per  iteration,  than 
standard  linear  model  based  methods.  Computational  results  are  presented  that  indicate  that 
both  tensor  methods  are  consistently  at  least  as  reliable  as  the  corresponding  linear  model 
based  methods,  and  are  significantly  more  efficient,  both  on  nonsingular  and  on  singular  test 
problems.  v  '  ' 


1.  Introduction 


This  paper  summarizes  a  recently  developed  class  of  methods,  called  tensor  methods,  for 
solving  the  nonlinear  equations  problem 

given  F  : /?"—*/?*,  find  x^Gfi"  such  that  F(x„)  =  0  (1.1) 

where  it  is  assumed  that  F(x)  is  at  least  once  continuously  differentiable.  Tensor  methods  are 
especially  intended  to  efficiently  solve  problems  where  the  Jacobian  matrix  of  F  at  x^, 
F  (x*)  c  R  ",  is  singular  or  ill-conditioned.  They  also  are  intended  to  be  at  least  as  efficient 

I 

as  standard  methods  on  problems  where  F  (xj  is  nonsingular.  Their  distinguishing  feature  is 
that  they  base  each  iteration  on  a  quadratic  model  of  F(x)  whose  second  order  term  has  a  sim¬ 
ple  form. 

Systems  of  nonlinear  equations  arise  frequently  in  many  practical  applications  including 
equilibrium  calculations,  curve  tracing  problems,  and  as  subproblems  in  solving  nonlinear  sys¬ 
tems  of  differential  equations.  In  many  important  situations,  F  (x^)  is  singular  or  ill- 
conditioned.  For  example,  in  some  stiff  systems  of  ordinary  differential  equations  the  Jacobian 
of  the  associated  system  of  nonlinear  equations  is  nearly  singular  for  all  x .  The  calculation  of 
turning  points  in  curve  tracing  problems  and  the  solution  of  over-parameterized  data  fitting 
problems  are  other  common  situations  that  lead  to  singular  systems  of  equations.  In  all  these 
cases,  it  is  important  to  notice  that  the  (near)  rank  deficiency  in  the  derivative  matrix  usually 
is  small.  This  is  the  case  in  which  our  methods  are  intended  to  improve  upon  standard 
methods. 

Standard  methods  for  solving  (l.l)  base  each  iteration  upon  a  linear  model  ,V/(x)  of  F(x) 
around  the  current  iterate  z/R" . 
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M(X'+d)  =  F(xc)  +  Jcd  (1.2) 

where  d(zR* ,  Jc€.Rn*n .  These  methods  can  be  divided  into  two  classes:  derivative  methods, 
where  Je  is  the  current  Jacobian  matrix  F  (ic)  or  a  finite  difference  approximation  to  it,  and 
secant  methods,  where  Jc  is  a  secant  (quasi-Newton)  approximation  to  the  Jacobian.  For  a 
general  description  of  these  methods,  see  e.g.  Dennis  and  Schnabel  [1983  . 

When  the  analytic  Jacobian  is  available,  the  linear  model  (1.2)  becomes 

Xf(xc+d)  =  F{xe)  +  F  (X')d.  (1.3) 

The  standard  method  for  nonlinear  equations,  Newton’s  method,  consists  of  setting  the  next 
iterate  x  +  to  the  root  of  (1.3), 

F  (*cV1f(*')-  (!••*) 

If  F  (zc)  is  Lipschitz  continuous  in  a  neighborhood  containing  the  root  xM  and  F  (ij  is  non- 
singular,  then  the  sequence  of  iterates  produced  by  (1.4)  converges  locally  and  g-quadratically 
to  x^.  This  means  that  there  exist  <‘‘>0  and  c  >0  such  that  the  sequence  of  iterates  )  xk  )  pro¬ 
duced  by  Newton's  method  obeys 


if  llx0  —  x^ll  <  <\  In  practice,  local  g-quadratic  convergence  means  eventual  fast  convergence. 

Newton’s  method  usually  is  not  quickly  locally  convergent,  however,  if  F  (x^)  is  singular. 
For  example  when  applied  to  one  equation  in  one  unknown  (n  - 1)  where  /(x<()  =  0  but 
/  (•r,)’1'0-  Newton's  method  is  locally  ^-linearly  convergent  with  constant  converging  to  1 
meaning  that  the  sequence  of  iterates  J  xk  )  obeys 

!r*. i  *.!  -  lim  ‘  ' 

k  — ♦  • 

if  Jx0  ~  x*\  19  sufficiently  small.  For  systems  of  equations,  the  situation  is  more  complex  and 
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has  been  analyzed  by  many  authors,  including  Decker  and  Kelley  1980a,  1980b,  1982  ,  Decker, 
Keller,  and  Kelley  1983  ,  Griewank  [1980a,  1980b,  1 985 1 ,  Griewank  and  Osborne  [1981,1983  , 
Keller  19701 ,  Kelley  and  Suresh  ! 1 983],  Rail  1966],  and  Reddien  '1978,  1 980i .  In  summary, 
their  papers  show  that  from  many  starting  points,  Newton’s  method  for  systems  of  equations 
also  is  locally  (/-linearly  convergent  with  constant  converging  to  h>,  although  for  some  problems 
with  starting  points  arbitrarily  close  to  xm,  (1.4)  may  be  undefined  or  lead  further  away  from 
the  solution  (see  e.g.  Griewank  and  Osborne  1983').  In  practice,  Newton’s  method  usually 
exhibits  local  linear  convergence  with  constant  N  on  singular  problems,  much  slower  conver¬ 
gence  than  one  would  like. 

When  analytic  derivatives  are  unavailable  and  function  evaluation  is  expensive,  (1.1)  gen¬ 
erally  is  solved  by  a  secant  method.  These  methods  attempt,  as  much  as  possible,  to  solve  (1.1) 
using  only  the  function  values  at  the  iterates.  The  ..  '',"1  '!  ‘21  ;,il!  N  t-ed  but  the  matrix  ./.  is 
generated  from  these  function  values  and  may  be  a  very  rough  approximation  to  F  (rc).  In 
the  most  commonly  used  secant  method  for  systems  of  equations,  Broyden’s  method,  the  Jaco¬ 
bian  approximation  J.  is  chosen  to  be  the  smallest  change  to  the  previous  Jacobian  approxima¬ 
tion  which  causes  the  new  linear  model  M(x)  to  interpolate  the  value  of  F(x)  at  the  previous 
iterate.  This  results  in  a  rank  one  change  to  the  Jacobian  approximation  at  each  iteration. 
(The  details  are  given  in  Section  3.1.)  The  initial  Jacobian  approximation  is  made  by  finite 
differences,  and  sometimes  it  is  necessary  to  reset  J  to  a  finite  difference  approximation  at 
subsequent  iterations. 

The  sequence  of  iterates  produced  by  Broyden's  method  converges  locally  and  q- 
superlinearly  to  r(  as  long  as  F  (rr)  is  Lipschitz  continuous  in  a  neighborhood  containing  the 
root  and  F  (r€)  is  nonsingular  (Broyden,  Dennis,  and  More'  1973').  This  means  that  there 
exist  ■'  0  and  r  0  such  that  the  sequence  of  iterates  \ik  \  obeys 


wirwi 
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lim  llx^  +1  —  z^l!  /  llzt  —  z^ll  =  0 
*  —  > 

I 

if  llzQ  z^ll  <  A  and  IIJ0  —  F(z0)ll  <  r.  In  practice,  secant  methods  still  are  quickly  conver¬ 
gent  on  nonsingular  problems,  and  while  they  usually  require  more  iterations  than  Newton’s 
method,  they  usually  require  fewer  function  evaluations  than  a  finite  difference  implementation 
of  Newton’s  method. 

However  secant  methods,  like  Newton’s  method,  are  slowly  convergent  on  problems  where 

1  i  ii 

F  ( x *)  is  singular.  For  example  on  one  variable  problems  with  /  (x  )=0  but  /  (z  )?t0,  the 
secant  method  is  locally  ^-linearly  convergent  with  constant  converging  to  0.618,  a  slightly 
slower  rate  than  Newton’s  method.  For  multiple  variable  problems  with  rank(F  (x^))  --  n  -1, 
Decker  and  Kelley  I1985j  have  shown  that  this  same  rate  of  convergence  is  obtained  by 
Broyden’s  method  from  certain  starting  points.  As  in  the  case  of  Newton’s  method,  this  slow 
linear  convergence  usually  is  observed  in  practice,  making  quicker  methods  desirable. 

Several  papers,  for  example  Decker  and  Kelley  (1982},  Decker,  Keller,  and  Kelley  1983  , 
Griewank  [1980a,  1985],  Kelley  [1985],  and  Kelley  and  Suresh  [1983],  propose  methods  that  are 
rapidly  convergent  on  some  singular  problems.  Many  of  these  methods  are  related  to  the  one 
dimensional  acceleration  technique  of  taking  j  times  the  Newton  step  if  one  has  a  root  of  mul¬ 
tiplicity  j.  Some  other  methods  explicitly  calculate  and  use  higher  derivative  information  in 
null  space  directions.  To  our  knowledge,  no  computational  experience  with  a  complete  method 
of  this  type  has  been  published,  and  it  is  not  clear  how  amenable  these  techniques  are  to  solv¬ 
ing  general  systems  of  nonlinear  equations  where  it  is  unknown  a  priori  whether  F  (z^)  is 
singular  or  not. 

The  major  aim  of  tensor  methods  is  to  provide  general  purpose  methods  that  have  rapid 
convergence  even  when  F  (z^)  is  singular.  In  addition,  the  methods  should  not  experience  any 
special  difficulty  when  Jc  is  singular  or  ill-conditioned,  while  methods  based  on  (1.2)  must  be 
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modified  in  this  case. 

Tensor  methods  are  based  on  expanding  the  linear  mode!  (1.2)  of  F(x)  around  xc  to  the 
quadratic  model 

A/T(xe-M/)  =  F(xe)  -  Jcd  +  W'dd  (1.5) 

where  TceRn<'Xn  and  J.  is  F  (xf)  or  a  secant  approximation  to  it.  The  three  dimensional 

object  Tc  often  is  referred  to  as  a  tensor,  hence  we  call  (1.5)  a  tensor  model,  and  methods 

based  upon  (1.5)  tensor  methods.  The  term  Tcdd  is  defined  by  (Tcdd)  t]  =  d  Htd,  where  Ht  is 
the  i‘h  horizontal  face  of  Tc-  Thus  the  model  MT(xc  -  d )  is  the  n-vector  of  quadratic  models  of 
the  component  functions  of  /■'(x), 

(.Vfr(xf  ~d))  «.  =  /,  -  gjd  -  1  id T  H%d  ,  «  =  l,  ■  •  •  ,r» 

.  T  1 

where  fx  =  /*"( xc )) i i,  gt  -  row  i  of  F  (x^)  or  an  approximation  to  it,  and  //,  is  approxima¬ 

tion  to)  the  Hessian  matrix  of  the  «**  component  function  of  F(x). 

The  obvious  choice  of  Tc  in  (1.5)  is  the  matrix  F  (xc)  of  second  partial  derivatives  of  F 

t 

at  ic ;  if  Jc  is  F  (xf),  this  makes  (1.5)  the  first  three  terms  of  the  Taylor  series  expansion  of  F 

•  i 

around  x^.  Several  serious  disadvantages,  however,  make  (1.5)  with  Tc  F  (xc)  unacceptable 

3 

for  algorithmic  use.  First,  the  n  second  partial  derivatives  of  F  at  xc  would  have  to  be  com- 

puted  at  each  iteration.  Second,  the  model  would  take  more  than  n  /2  locations  to  store  as 
2 

compared  to  the  n  locations  for  the  standard  model.  Third,  to  find  a  root  of  the  model,  at 
each  iteration  one  would  have  to  solve  a  system  of  n  quadratic  equations  in  n  unknowns, 
which  for  n  •  1  requires  an  iterative  procedure.  Finally,  the  model  might  not  have  a  real  root. 

To  use  a  model  of  form  (1.5)  and  avoid  these  disadvantages,  our  tensor  methods  use  a 
very  restricted  form  of  T  .  In  particular,  our  tensor  methods  require  no  additional  derivative 
or  function  information;  the  additional  costs  of  forming  and  solving  the  tensor  model  are  small 
Cv>mpared  to  the  0(n3)  arithmetic  cost  per  iteration  of  standard  methods;  and  the  additional 
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storage  required  for  our  tensor  models  is  small  compared  to  the  n  storage  required  for  the 
Jacobian.  The  remainder  of  this  paper  describes  how  we  utilize  the  tensor  term  T  in  the 
model  (1.5)  and  what  benefits  we  obtain  from  its  inclusion.  In  Section  2  we  summarize  the  use 
of  a  tensor  model  in  derivative  methods  for  nonlinear  equations,  and  our  computational  experi¬ 
ence  with  this  method.  Section  3  similarly  presents  the  use  of  and  computational  experience 
with  a  tensor  model  in  secant  methods  for  nonlinear  equations.  In  Section  -4  we  briefly  com¬ 
ment  on  extensions  of  tensor  methods  to  nonlinear  least  squares  and  to  unconstrained  optimi¬ 
zation.  More  details  on  this  research  can  be  found  in  Frank  1984  ,  Schnabel  and  Frank  1984  , 
and  an  upcoming  paper  by  Frank  and  Schnabel. 

Notice  that  we  are  denoting  members  of  a  sequence  of  n-vectors  i  by  \  z,  \  where  each 
ikirRn ,  and  components  of  a  vector  #t/i"  by  v  i  \:R . 


2.  Derivative  tensor  methods 

Derivative  tensor  methods  base  an  algorithm  for  solving  systems  of  nonlinear  equations 
on  a  model  of  the  form 

.V/T(xc  -d)  --  F(rJ  *  F  U.Jd  -  Tcdd ,  12.1) 

where  it  is  assumed  that  F  ( )  either  is  supplied  analytically  or  is  calculated  by  finite 
differences.  Their  aim  is  to  choose  TcFR"  "  "  so  i  iial  the  model  (2.1)  is  hardly  more  expen¬ 
sive  to  form,  store,  or  solve  than  the  standard  model  ( 1.3).  while  still  leading  to  an  algorithm 
that  requires  fewer  function  evaluations  than  standard  methods  to  solve  difficult  problems. 
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2.1  Forming  the  tensor  model 

The  first  step  in  deriving  a  method  based  on  (2.1)  is  to  choose  the  second  order  term  T  . 
We  do  not  use  any  second  derivative  information  in  constructing  T r .  Instead,  we  construct  the 
second  order  term  in  (2.1)  by  asking  the  model  to  interpolate  additional  values  of  the  function 
F  (r)  that  have  already  been  computed  by  the  algorithm.  In  particular,  we  ask  the  model  to 
satisfy 

=  ^(*c)  -  F  (xc)sk  r-  hTcsksk,  kr=\,  -  -.p  (2.2a) 

where 


\  *-k  -  zc'  ’  '  '  -P  • 

and  z  ,,  ■  ■  ■  ,x  are  some  set  of  p  not  necessarily  consecutive  past  iterates. 


For  the  equations  (2.2)  to  be  consistent,  the  past  points  (x_t|  must  be  selected  so  that 
set  of  directions  |  sk j  is  linearly  independent.  In  fact,  we  enforce  a  far  more  restrictive  condi¬ 
tion.  We  always  set  x_j  to  the  most  recent  iterate.  We  then  include  each  remaining  past 
iterate  in  the  set  of  points  to  be  interpolated  if  the  step  from  it  to  ic  makes  an  angle  of  at 
least  H  degrees  with  the  subspace  spanned  by  the  steps  to  the  already  selected  more  recent 
iterates.  Here  H  is  some  fixed  angle  between  20  and  45  degrees.  In  addition,  we  consider  at 
most  v  n  pa9t  iterates.  The  bounds  ^  n  and  20-45  degrees  have  been  shown  by  computational 
experience  to  be  reasonable.  This  procedure  for  selecting  past  iterates  to  interpolate  is  implo- 
mented  easily  using  a  modified  Gram-Schmidt  algorithm,  and  requires  about  n '  multiplications 
and  additions. 


The  equations  (2.2)  are  a  set  of  np  n'  5  linear  equations  in  the  n  *  unknowns  comprising 
T  e  •  Thus  T  _  is  underdetermined,  so  we  follow  the  standard  and  successful  practice  in  secant 
methods  for  nonlinear  equations  and  optimization  (see  e.g.  Dennis  and  Schnabel  1979)  and 
choose  T  to  be  the  solution  to 


(2.3) 


minimize  IITJIj, 

TC£R'“’* 

subject  to  Tesksk  =  tk  ~  2  (  F(z  _k)  -  F(zc)  -  F  (xe)sk  )  .  k  1,  ■  •  •  ,p 

where  IMIf  is  the  Frobenius  norm.  If  we  denote  by  uvw  the  rank  one  tensor  whose  t'(*  horizon- 
tal  face  is  the  rank  one  matrix  u  t  (tu;  ),  then  the  solution  to  (2.3)  is  shown  by  Schnabel  and 
Frank  I984j  to  be 

n 

TC  =  ^  **  V  (2,1) 

*- 1 

where  (a(  i  ,  •  •  •  ,af.i)T  —  M  1  *  (<t  •  •  •  ,/  .  i  )T ,  t=l,  •••,«,  with  \f-Rv  <r  the  positive 

T  2 

definite  matrix  defined  by  M  ij]  —  (st  s} )  ,  l<tj<p. 

Substituting  (2.4)  into  the  tensor  model  (2.1)  gives 

v 

\fT(xc-d)  =  F( xe)  +  F  {xc)d  +  '4  V  ak  (dTsk)'.  (2.5) 

*=t 

The  simple  form  of  the  second  order  term  in  (2.5)  is  the  key  to  being  able  to  efficiently  form, 
store,  and  solve  the  tensor  model.  Since  p  ^  n,  the  additional  storage  for  the  entire  method 
is  at  most  4^  n  n-vectors,  compared  to  the  n2  storage  for  F  (jf).  The  cost  of  forming  the  ten- 
sor  model  by  the  above  procedure  is  at  most  n’  multiplications  and  additions  per  iteration, 
small  compared  to  the  at  least  n  / 3  multiplications  and  additions  per  iteration  required  by 
standard  methods. 
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2.2  Solving  the  derivative  tensor  model 

To  base  an  efficient  algorithm  on  the  tensor  model  (2.5),  we  need  to  efficiently  find  a  root 
of  this  model,  that  is  a  d'~Rn  for  which 

p 

Mt(xc  t  d)  =  F(xc)  t-  F  (xc)d  -  V  ak  [d?  skf  -  0.  (2.6) 

*  =  i 

In  some  cases,  the  tensor  model  may  have  no  root;  it  is  then  appropriate  to  choose  d  to  minim¬ 
ize  the  tensor  model  in  some  norm.  We  choose  the  l2  norm,  so  that  the  general  problem  we 
wish  to  solve  is  to  choose  d^Rn  to  minimize  WMT(xc  —  d)ll2. 

The  basic  idea  behind  efficiently  solving  (2.6)  is  that  since  MT  only  is  quadratic  on  the  p 
dimensional  subspace  spanned  by  Js^J,  and  is  linear  on  the  orthogonal  complement  to  this  sub¬ 
space,  it  may  be  possible  to  solve  (2.6)  by  solving  a  svstem  of  p  quadratic  equations  in  p  unk¬ 
nowns  plus  a  system  of  n—  p  linear  equations  in  n—p  unknowns.  This  is  accomplished  by  a 
procedure  given  in  Schnabel  and  Frank  [1984].  This  procedure  first  makes  an  orthogonal 

T  .  .  . 

transformation  of  the  variable  space  to  d  =  Q  d,  so  that  all  n  equations  are  quadratic  only  in 

the  last  p  components  of  d ,  d2(zRp ,  and  are  linear  in  the  first  n—p  components,  d^R*  P  ■  It 
then  makes  an  orthogonal  transformation  of  the  equations  that  eliminates  the  linear  variables 

d,  from  the  final  p  (actually  <j>p,  see  below)  equations  and  makes  the  preceding  equations  tri¬ 
angular  in  dr  The  result  is  n-q  equations  that  are  linear  in  the  n—p  variables  dv 

T 

Fy  -  Jkdx  -f  J2d2  +  T  ,4.  ;  s2  rf2!2  =  0  (“-7al 

where  7,  is  upper  triangular,  plus  the  system  of  q  quadratic  equations  in  the  p  unknowns  d„ 

T 

F2  J3d2  -  T  A2  1  s2  d2\2  -  0. 


(2.7b) 
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Here  g>p,  with  q=p  as  long  as  J  is  nonsingular,  or  J  is  singular  but  J  augmented  by  the  p 
T 

rows  has  full  column  rank.  In  practice  this  means  that  q  generally  equals  p  unless 

f 

rank(F  (xj)  <  n— p.  The  root  or  minimizer  of  MT  is  then  found  from  (2.7)  by  calculating  the 
which  is  the  root  of  minimizer  of  the  quadratic  system  of  equations  (2.7b),  substituting  this 
d2  into  (2.7a)  and  calculating  dt  by  solving  a  triangular  system  of  linear  equations,  and  multi¬ 
plying  d  by  Q  to  obtain  d. 

The  cost  of  solving  the  tensor  model  by  this  process  is  the  standard  2/3  n  cost  of  a  QR 
factorization,  plus  an  additional  n  p  <  n  cost  for  the  orthogonal  transformation  of  the  vari¬ 
able  space,  plus  the  cost  of  solving  the  pxp  system  of  quadratics.  The  latter  is  limited  to  0[p) 

•  .  .  3  4 

iterations  which  each  cost  p  /6  multiplication  and  additions,  so  it  is  an  insignificant  0(p  )  < 

0(n  )  cost.  The  case  p=l  is  the  most  frequent  in  our  computational  experience,  and  in  this 
case  the  quadratic  equation  is  solved  or  minimized  analytically.  Thus  solving  the  derivative 
tensor  model  costs  essentially  the  same  as  finding  the  root  of  the  standard  linear  model  (1.3)  by 
the  QR  factorisation.  It  is  possible  to  adapt  the  tensor  solution  algorithm  to  use  the  PLU  fac¬ 
torization,  or  a  sparse  factorization,  instead. 

I 

On  singular  problems  with  rank(F  (z*))  >  n  —  p,  the  solution  of  the  tensor  model  by  the 
above  process  usually  will  be  well  posed.  The  convergence  analysis  for  singular  systems  of  non¬ 
linear  equations  shows  that  near  zm,  we  can  expect  the  past  steps  |st]  to  be  in  directions  near 
the  null  vectors  of  F  (xm).  In  this  case,  the  quadratic  term  of  the  tensor  model  supplies  infor¬ 
mation  in  the  directions  where  the  linear  model  is  lacking.  This  results  in  the  linear  system 
(2.7a)  being  well  conditioned,  and  moves  the  iil-conditioning  of  the  standard  linear  model  into 
the  linear  term  of  the  quadratic  equations  (2.7b),  which  still  are  well  posed  due  to  the  qua¬ 
dratic  term. 
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In  addition,  if  F  (xc)  happens  to  be  singular  or  ill-conditioned  at  any  iteration  on  any 
problem,  and  has  p  or  fewer  small  or  zero  singular  values,  then  the  solution  of  the  tensor  model 
usually  will  be  well  posed  for  similar  reasons. 

2.3  Computational  results  with  the  derivative  tensor  method 

A  computer  implementation  of  a  derivative  tensor  method  that  is  based  upon  the  ideas 
summarized  in  Sections  2.1  and  2.2  has  been  extensively  tested.  A  high  level  description  of  the 
method  we  have  implemented  is  given  in  Algorithm  2.1. 

Algorithm  2.1.  An  Iteration  of  the  Derivative  Tensor  Method:  given  ic,  F(zc) 

1.  Calculate  F  ( xc )  and  decide  whether  to  stop.  If  not  : 

2.  Select  the  past  points  to  use  in  the  tensor  model  from  among  the  V  n  m0st  recent  past 

points. 

3.  Calculate  the  second  order  term  of  the  tensor  model,  Te,  so  that  the  tensor  model  interpo¬ 

lates  F[x)  at  all  the  points  selected  in  step  2. 

4.  Find  the  root  of  the  tensor  model  ,  or  its  minimizer  (in  the  /2  norm)  if  it  has  no  real  root. 

5.  Select  z+  =  xc  —  \c  de,  where  dc  either  is  the  step  calculated  in  step  4  or  the  Newton  step, 

using  a  line  search  to  choose  \c. 

8.  Set  xe  «—  F{X')  «—  F(x+),  go  to  step  1. 

Details  of  our  implementation  are  given  in  Frank  |1984j  and  Schnabel  and  Frank  [1984]. 
Note  that  the  Newton  step  is  calc  dated  as  a  byproduct  of  the  tensor  model  solution,  and  occa¬ 
sionally  is  used  as  the  search  direction  in  the  tensor  method.  In  particular,  the  Newton  step  is 
used  in  step  5  if  Algorithm  2.1  finds  a  root  dT  of  the  tensor  model  that  isn't  a  descent  direction 
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for  ll/r(x)ll2  (a  very  rare  occurrence  in  practice  but  not  precluded  in  theory)  and  the  point 
zc  \-dT  is  unacceptable;  if  Algorithm  2.1  finds  a  miriimizer  of  the  tensor  model  at  which  the  l,, 
norm  of  the  tensor  model  hasn't  decreased  enough  from  xc ;  or  if  Algorithm  2.1  fails  to  find  a 
root  or  minimizer  of  the  tensor  model  in  8 p  iterations. 

We  compared  our  tensor  method  to  an  algorithm  that  is  identical  except  that  the  second 
order  term  Tc  always  is  zero.  That  is,  the  comparison  algorithm  is  a  finite  difference  Newton's 
method  with  a  line  search,  except  that  the  Newton  step  —  Jc  is  modified  to  the  approxi- 

mat  ion  to  the  pseudo-inverse  step  ~(JC  Jc  +  '/)  J  F(xc)  with  <  small  (see  Dennis  and  Schna¬ 
bel  ,1983))  when  Jc  —  F  (xc)  is  singular  or  sufficiently  ill-conditioned. 

The  Newton  and  tensor  methods  were  compared  on  sets  of  nonsingular  and  singular  test 
problems.  The  results  are  summarized  briefly  in  Tables  2.1  -  2.3.  The  nonsingular  test  prob¬ 
lems  are  a  standard  set  in  this  field,  given  in  More",  Garbow,  and  Hillstrom  [  1 98 1 1 ;  their  dimen¬ 
sions  range  from  n  =  2  to  30.  The  singular  problems  are  simple  modifications  of  these  prob¬ 
lems  constructed  to  have  the  same  solution  x  with  rank(F  (im))  =  n—  1  and  n  — 2,  respec¬ 
tively.  The  procedure  for  generating  these  singular  problems  is  described  in  Schnabel  and 
Frank  1984|. 

A  significant  feature  of  the  test  results  is  that  the  tensor  method  is  virtually  never  less 
efficient  than  the  standard  method,  and  is  almost  always  more  efficient.  In  fact,  on  problems 
requiring  ten  or  more  iterations  of  the  standard  method,  the  tensor  method  always  is  more 
efficient.  The  gains  in  efficiency  on  the  nonsingular  problems  are  an  average  of  about  IH^e  if 
all  test  problems,  including  some  very  easy  problems  where  no  gains  are  likely,  are  considered, 
and  an  average  of  about  32%  improvement  on  the  harder  problems.  The  gains  in  efficiency  on 
the  nonsingular  problems  are  an  average  of  about  40%  and  30cr  in  the  rank  rr  l  and  n  2 
cases,  respectively,  and  an  average  of  about  37%  and  46%  improvement ,  respectively,  on  the 


Table  2.1  -  Summary  for  Problems  with  F  ( x  )  Nonsingular 

Problem  Number  of  Average  Ratio,  Tensor  Standard  Tie 

Set  Problems  Tensor  Method  /  Standard  Method  Better  Better 

terations  Jacobian  Function 
evaluations  evaluations 

All  problems  25  0.811  0.813  0.828  18  1  6 

Harder  problems  only  *  11  0.662  0.668  0.691  11  0  0 

Additional  problems  solved  by  standard  method  only  :  2 

by  tensor  method  only  :  1 

Table  2.2  --  Summary  for  Singular  Test  Set  with  Rank  (F  (x*))  =  n—  l 

All  Problems  17  j  0.576  I  0.609  I  0.603  15  0  2 

Harder  Problems  Only  *  9  I  0.392  ,  0.129  0.131  9  0  0 


Table  2.3  -•  Summary  for  Singular  Test  Set  with  Rank  (F  (x*))  =  n— 2 


All  Problems 

13 

0.631 

0.864 

0.729 

11 

2 

Harder  Problems  Only  * 

7 

0.499 

0.535 

0.542 

7 

0 

Additional  problems  solved  by  standard  method  only  :  1 

by  tensor  method  only  :  5 


*  Problems  where  slower  method  required  at  least  10  iterations 


harder  problems.  In  addition,  the  tensor  method  solved  significantly  more  of  the  problems  with 
rankfF  (x,)|  -  n—  2  than  the  standard  method;  this  is  not  reflected  in  Table  2.3  which  reflects 
only  problems  solved  by  both  methods. 
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The  improvements  by  the  tensor  method  on  the  problems  with  rank(F  (xm))  =  n— 1  are 
partially  explained  by  the  faster  local  convergence  of  the  tensor  method  as  discussed  in  Section 
2.4.  in  fact,  our  stopping  tolerances  were  relatively  loose;  at  tighter  stopping  tolerances  the 
improvements  by  the  tensor  method  on  singular  problems  are  greater.  On  nonsingular  prob¬ 
lems,  the  improvements  by  the  tensor  method  apparently  come  from  using  a  model  that  better 
interpolates  F(r);  to  our  knowledge,  the  local  convergence  rate  is  no  better  than  for  Newton’s 
method. 

These  computational  results  indicate  that  the  derivative  tensor  method  is  consistently  as 
reliable  as  currently  used  methods  for  solving  systems  of  nonlinear  equations.  In  addition,  on 
problems  where  function  evaluation  is  the  dominant  cost,  it  is  consistently  as  efficient  and 

i 

often  considerably  more  efficient,  especially  on  problems  with  a  small  rank  deficiency  in  F  (*„)• 
The  additional  cost  on  the  tensor  method  in  arithmetic  ipernfions  and  computer  storage  is 
small.  Indeed,  in  our  tests,  even  on  problems  with  n  =  30,  the  number  of  past  points  interpo¬ 
lated  by  the  tensor  model  generally  was  I  or  2,  so  that  the  additional  arithmetic  and  storage 
costs  are  very  small.  For  these  reasons,  we  believe  that  the  derivative  tensor  method  should  be 
considered  as  a  promising  alternative  to  standard  methods  for  general  purpose  software  for 
solving  systems  of  nonlinear  equations. 


2.4  Convergence  analysis  for  the  derivative  tensor  method 

Frank  [1984j  has  extensively  analyzed  the  local  convergence  of  the  derivative  tensor 
method.  The  most  important  result  is  that  when  rank [F  (*„))  =  n— 1,  the  method  described  in 
the  previous  sections  is  shown  to  be  locally  3-step  convergent  with  7-order  7/6,  meaning  that  if 
Hx0  tm II  is  sufficiently  small,  the  sequence  of  iterates  i  zk  \  converges  to  and,  for  some  c  0, 
obeys 


16 


,lxk*3  -**"  <  C  l'Xk  ~  XJ'i 

for  all  k  >  0.  This  rate  of  convergence  is  significantly  faster  than  Newton’s  method  which  is 
linearly  convergent  with  constant  approaching  1/2  under  the  same  assumptions.  For  simplicity 
of  analysis,  Frank's  result  is  proven  for  a  method  that  interpolates  only  the  most  recent  past 
iterate  (p—  1);  however  it  is  not  expected  that  interpolating  additional  past  points  would  hurt 
its  performance. 

The  reasoning  behind  the  three  step  convergence  result  is  interesting  because  it  helps 
explain  how  the  tensor  method  works  on  singular  problems.  From  an  arbitrary  starting  point 
close  to  xn,  it  is  shown  that  the  first  step  provides  at  least  linear  convergence  and  results  in  an 
iterate  whose  error  (its  difference  with  ij  is  nearly  in  the  direction  of  the  null  vector  of 
F  (*„)•  The  next  step  also  provides  at  least  linear  convergence  and  also  results  in  an  error 
nearly  in  the  null  vector  direction.  Thus  after  two  steps,  the  current  iterate  and  the  previous 
iterate  are  both  close  to  being  along  the  null  vector  direction  from  ix,  so  that  the  step  (the 
difference  of  these  iterates)  used  in  constructing  the  tensor  term  is  essentially  in  this  direction. 
Thus  the  quadratic  term  of  the  tensor  model  provides  information  in  precisely  the  direction 
where  the  linear  model  is  lacking.  This  causes  the  third  step  to  be  a  fast  one,  in  fact  giving  an 
order  1.5  improvement  which  would  lead  to  three  step  y-order  1.5.  (The  smaller  7/6  rate 
comes  from  allowing  for  the  possibility  that  the  first  or  second  steps  are,  by  luck,  too  good.) 
After  the  third  step,  the  error  of  the  new  iterate  is  not  guaranteed  to  be  close  to  the  null  space 
of  F  (*„),  so  the  three  step  process  repeats.  In  practice,  however,  the  errors  appear  to  remain 
close  to  the  null  space  so  that  one  step,  at  least  y-superlinear,  convergence  is  observed. 

When  the  rank  of  F  (im)  is  less  than  n  —  1.  the  derivative  tensor  method  probably  is  not 
faster  than  linearly  convergent  in  theory  because  the  mode!  described  in  Section  3.1  does  not 
approximate  enough  of  F  (r).  However  the  test  results  of  the  previous  section  indicate  that 
fast  convergence  still  is  obtained  in  the  case  rank(F  (*„))  —  n  —2.  It  would  be  possible  to 
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approximate  the  necessary  portions  of  F  (z)  using  previous  values  of  the  Jacobian  rather  than 
the  function;  this  has  not  be  pursued. 

i 

When  F  (z*)  is  nonsingular  Frank  [1984]  shows  that  the  tensor  method  retains  the  q- 
quadratic  convergence  of  Newton’s  method.  This  simply  means  that  close  to  the  solution,  the 
quadratic  term  of  the  tensor  model  has  a  small  effect  and  does  not  hurt  the  convergence. 
When  r»=l  it  can  be  shown  that  the  derivative  tensor  method  has  g-order  2.41,  but  the  tensor 
method  doesn’t  interpolate  enough  information  for  this  result  to  extend  to  multi-dimensional 
problems. 


3.  Secant  Tensor  Methods 

When  analytic  Jacobians  are  unavailable  and  function  evaluation  is  sufficiently  expensive, 
it  is  not  cost  effective  to  calculate  a  finite  difference  Jacobian  approximation  at  each  iteration 
of  a  method  for  solving  systems  of  nonlinear  equations.  Instead,  secant  methods  are  used  that 
only  occasionally  used  finite  difference  Jacobians  and  otherwise  base  the  method  strictly  upon 
the  function  values  at  the  iterates. 

The  standard  secant  method  for  nonlinear  equations,  Broyden’s  method,  uses  a  linear 
model  of  F(z)  around  zc  that  interpolates  only  F(xc)  and  /r(z_]),  where  z is  the  previous 
iterate.  Thus  there  are  additional  previous  function  values,  namely  F(z_2),  F(x_3),  ....  that  a 
method  still  could  interpolate.  In  the  derivative  tensor  method  discussed  in  Section  2,  the 
interpolation  of  these  function  values  was  the  basis  for  the  tensor  term  T  .  In  the  secant  case, 
however,  since  the  first  derivative  matrix  is  not  known,  the  value  of  F{x)  at  a  previous  iterate 
x  k  only  is  sufficient  to  determine  a  linear  model  in  the  direction  zc  —  x  k.  Roughly  speaking, 
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only  if  there  are  two  previous  iterates  in  the  same  direction  from  xe  is  there  sufficient  informa¬ 
tion  to  determine  a  quadratic  model.  This  indicates  that  it  will  be  more  difficult  to  form  a 
quadratic  model  in  the  secant  method  case.  Recall,  however,  that  for  singular  problems  the 
iterates  often  converge  nearly  along  a  single  direction  so  that  a  quadratic  model  still  may  be 
possible. 

Thus  before  considering  how  one  might  base  a  tensor  secant  model  upon  the  interpolation 
of  multiple  function  values,  it  is  relevant  to  discuss  how  a  linear  secant  model  can  interpolate 
multiple  function  values.  We  first  summarize  this  briefly,  and  then  discuss  when  and  how  we 
form  a  secant  tensor  model,  and  our  computational  results  with  the  secant  tensor  method. 


3.1  Linear  Models  with  Multiple  Secant  Equations 

Suppose  xc  is  the  current  iterate  and  r_1,  ...,  x  are  a  set  of  p  not  necessarily  consecu¬ 
tive  past  iterates  chosen  as  in  Section  2.1.  That  is,  x_l  is  the  most  recent  past  iterate,  and  the 
set  of  directions  | ak  |  =  !z_*  ~  xe\  are  linearly  independent.  Then  it  is  possible  to  choose  the 
Jacobian  approximation  Jc^RnXn  so  that  the  secant  model  (1.2)  interpolates  F(x_t), 
k=l,  •  •  •  ,p.  This  requires 

F(x_k)  =  F(xc)  +  Jcsk  ,  fc=l,  ■  •  •  ,p  .  (3.1) 

If  S,  Y£RnXf  are  defined  by  column  k  of  S  =  sk,  column  k  of  Y  =  F(x_k)  ~  F(x(),  then 
Schnabel  ‘1983)  shows  that  the  closest  matrix  J.  to  the  previous  Jacobian  J_x  that  causes  (3.1) 
to  be  satisfied  is 

Jc  =  +(Y-J^S)(ST Sf'  ST  .  (3.2) 

Broyden's  method  simply  is  the  special  case  of  (3.2)  with  p=l.  The  update  (3.1)  appears  to  be 
a  rank  p  change  to  J  ,,  but  if  the  linear  model  at  the  previous  update  interpolated  all  the 
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same  previous  function  values  except  F(xe),  then  (3.2)  is  a  rank  one  update. 

Multiple  secant  updates  along  these  lines  have  been  proposed  by  Barnes  [1965},  Gay  and 
Schnabel  [1978j,  and  Schnabel  (1983j.  Frank  [1984]  implemented  a  version  where  the  past 
iterates  to  interpolate  are  chosen  by  the  fairly  restrictive  criteria  described  in  Section  2.1, 
essentially  very  strong  linear  independence  plus  only  ^  n  iterates  considered.  Schnabel  [1983] 
showed  that  this  method  is  g-superlinearly  convergent  under  the  same  conditions  as  Broyden’s 
method.  Frank  found  that  on  the  nonsingular  problems  from  the  More',  Garbow,  and  Hillstrom 
[1981]  test  set  the  multiple  secant  method  was  better  on  10  problems,  worse  on  2,  and  about 
the  same  on  21,  with  an  average  improvement  of  10%.  On  the  singular  problems  described  in 
Section  3.3,  the  multiple  secant  method  was  only  marginally  better  than  Broyden’s  method. 

These  results  indicate  that  a  properly  implemented  multiple  secant  method,  using  a 
linear  model,  is  consistently  at  least  as  efficient  as  Broyden’s  method.  Therefore,  our  secant 
tensor  method,  which  also  is  based  on  interpolating  multiple  function  values,  builds  upon  this 
linear  multiple  secant  model. 


3.2  Forming  the  Secant  Tensor  Model 

To  determine  a  second  order  model  of  F{x)  using  function  values  only,  it  is  necessary  to 
have  more  than  p+1  function  values  that  are  nearly  in  some  p  dimensional  subspace  of  the 
variable  space.  To  illustrate  the  approach  taken  in  forming  our  secant  tensor  model,  suppose 
that  there  are  two  past  iterates,  and  x _v  and  that  the  steps  to  them  from  if,  s,  and  s2, 
are  nearly  linearly  dependent.  That  is,  s2  -  <sl+z  where  z  s1~0  and  lUII/IISjll  is  small. 
The  tensor  secant  model  (1.5)  interpolates  F(x)  at  x_t  and  x  2  if 

F( x_k )  =  F(xc)  +  J.9k  +  ^Trsksk 


k  =  1,  2  . 


(3.3) 
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In  the  situation  where  s j  and  a2  are  nearly  collinear,  we  interpret  (3.3)  as  giving  two 
pieces  of  information  in  the  direction  Sj  and  no  new  information  in  the  direction  z.  Thus  we 
can  make  the  a  priori  assumptions  that 

Jez  =  J_Yz  ,  Tc  sY  z  =  Tc  z  z  =  0  ,  (3.4) 

for  we  know  that  our  minimum  norm  methods  for  choosing  Jc  and  T  will  cause  them  to 

satisfy  (3.4)  if  the  secant  equations  are  in  the  direction  only.  Combining  (3.3)  and  (3.4)  and 
using  «2  =  <>Sj+*  gives 

F(x-i)  =  F(ze)  +  Jcsi  +  '‘*Tcsisi  (3-5a) 

F(x_2)  =  +  M^csi  +  J~\z  +  (  <,2/2)  Tcslsl  .  (3.5b) 

Equations  (3.5)  are  two  linear  equations  in  the  two  unknown  vectors  Jcsl  and  T(.slsv  which 
are  easily  solved  to  yield 

2  2 

Jcb j  =  y  =  (n  u  —  v)  /  (o  —  o)  (3.6a) 

Tcs\3\  ~  t  =  (nn  -»)/(«-  '>*)  (3.6b) 

where  u  =  F{ i_t)  -  F(xc),  v  =  F(x_2)  -  F(xc)-y_jX. 

Equations  (3.6)  are  the  secant  equations  for  Jc  and  Tc,  respectively.  Given  these  condi¬ 
tions,  we  form  Je  as  in  the  linear  model  multiple  secant  method  and  Tc  as  in  the  derivative 

tensor  method.  That  is,  Je  is  given  by  (3.2)  with  Y  =  y  and  5  =  while  Tc  =  a  Sj  st  with 
T  2 

a  -  </(s,  a,)  .  Thus  the  tensor  model  becomes 

M{xc  -rd)  =  F(xc)  4-  Jcd  -t-  l/2  a  (wT d)2  (3.7) 

with  w  =  a,. 

The  remaining  issue  is  the  criterion  for  choosing  s2  to  be  "nearly  linearly  dependent"  on 
a,.  The  residual  z  cannot  be  allowed  to  be  too  large,  or  the  inaccuracy  in  ./  z  may  cause  the 
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tensor  T.  to  be  entirely  inaccurate.  Frank  j  1984  shows  that  it  is  necessary  that  hr  II  __ 
2 

0(ll«2ll  )  for  the  resultant  T  to  be  reliable.  Furthermore,  he  indicates  that  one  can  expect 
consecutive  iterates  to  satisfy  this  condition  near  the  solution  of  a  singular  problem.  There  is 
no  reason,  however,  to  expect  this  condition  to  be  satisfied  for  nonsingular  problems.  Thus  it  is 
likely  that  the  quadratic  term  in  the  secant  tensor  method  will  be  used  near  the  roots  of  singu¬ 
lar  problems  only. 

Frank  [1984}  has  generalized  the  above  procedures  to  use  more  past  iterates.  First  he 
chooses  p  strongly  linearly  independent  directions  to  past  iterates  by  the  same  process  used  in 
the  derivative  secant  method  and  the  linear  model  multiple  secant  method.  This  set  of  direc¬ 
tions  is  the  generalization  of  the  direction  Sj  in  the  above  example.  Then  he  chooses  q  direc¬ 
tions  to  additional  past  iterates  that  are  nearly  dependent  on  the  subspace  spanned  by  the  first 
set,  in  the  sense  described  above.  This  set  is  the  alizutloti  of  the  iirection  ■> in  the  above 

example.  In  our  computational  tests  the  second  set  contained  0  or  1  directions  over  99%  of  the 
time,  so  it  suffices  to  consider  this  case.  The  result  is  a  set  of  equations  similar  to  (3.5),  which 
are  easily  reduced  to  the  conditions  JCS  =  Y,  Tcww  —  t,  for  S,  Y(zR"xp  and  some  w  in  the 
span  of  the  columns  of  5.  Je  then  is  chosen  by  (3.2)  while  T.  =  aww  with  a  =  t /(w  w)  . 
Thus  the  secant  tensor  model  again  is  given  by  (3.7). 


3.3  Solving  the  Secant  Tensor  Model 

Algebraically,  the  secant  tensor  model  (3.7)  is  just  the  special  case  of  the  derivative  ten¬ 
sor  model  (2.5)  with  p=l.  Thus  its  root  or  minimizer  is  found  by  the  same  procedures 
described  in  Section  2.2.  Since  the  tensor  term  has  rank  one,  the  solution  process  results  in 
finding  the  root  or  minimizer  of  one  quadratic  equation  in  one  unknown  followed  by  the  solu¬ 
tion  of  a  system  of  n  — 1  linear  equations  and  n  -  1  unknowns.  Therefore  no  iterative  procedure 


21 


is  required  and  the  cost  is  essentially  the  same  as  for  finding  the  root  of  a  linear  model. 

2 

It  is  possible  to  perform  each  iteration  of  Broyden’s  method  in  0(n  )  operations  by 

sequencing  a  QR  factorization  of  J  as  proposed  originally  by  Gill  and  Murray  1972;.  These 

efficiencies  can  be  extended  to  the  secant  tensor  method  so  that  it  does  not  cost  appreciated 
2 

more  than  the  0(n”)  implementation  of  Broyden’s  method.  This  was  not  done  in  our  implemen¬ 
tation. 

In  the  case  where  the  quadratic  term  of  the  secant  tensor  model  has  rank  greater  than 
one  (which  virtually  never  occurred  in  practice),  Frank  1984!  solves  the  secant  tensor  model  by 
a  minor  generalization  of  the  techniques  of  Section  2.2. 

3.4  Computational  Results  with  the  Secant  Tensor  Method 

A  secant  tensor  method  has  been  implemented  and  tested  on  the  same  nonsingular  and 
singular  problems  that  were  used  for  the  derivative  tensor  method  tests  described  in  Section 
2.3.  The  basic  iteration  is  summarized  in  Algorithm  3.1  below. 

Algorithm  3.1.  An  Iteration  of  the  Secant  Tensor  Method:  given  x  ,  F(xc) 

1.  Decide  whether  to  stop.  If  not: 

2.  Select  the  two  sets  of  past  points  to  use  in  the  tensor  model  from  among  the  n  most 

recent  past  points. 

3.  Calculate  the  first  and  second  order  terms  of  the  tensor  model,  J  and  7\,  so  that  the  ten- 

sor  model  interpolates  F(z)  at  all  the  points  selected  in  step  2. 

4.  Find  the  root  of  the  tensor  model,  or  its  minimizer  (in  the  F  norm)  if  it  has  no  real  root. 

5.  Select  r,  by  a  trust  region  method  that  chooses  to  be  a  linear  combination  of  the 

steepest  descent  direction,  and  the  step  calculated  in  step  4  or  the  root  of  the  linear  part 
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of  the  model, 

6.  Set  zc  —  F(xe)*-F(x+),  go  to  step  1. 

In  addition  to  the  strategy  shown  in  Algorithm  3.1,  the  Jacobian  is  calculated  by  finite 
differences  at  the  initial  iteration  and  whenever  the  secant  algorithm  calculates  two  unsuccess¬ 
ful  trial  steps  in  a  row.  This  later  practice  is  taken  from  More’s  MINPACK  algorithm  (More, 
Garbow,  and  Hillstrom  I980i),  as  is  the  use  of  a  trust  region  strategy  at  step  5. 

The  secant  tensor  method  described  above  was  compared  to  a  Broyden’s  method  version 
and  to  a  linear  model  multiple  secant  method  version  of  the  same  code.  These  were  derived  by 
setting  T.  ~  0  in  the  secant  model,  and  by  allowing  one  or  multiple  secant  equations  for  the 
Jacobian  approximation,  respectively  (i.e.,  p  =  l  or  p  >1  in  (3.1)  and  (3.2)).  The  remainder  of 
the  code  was  unchanged. 

On  the  nonsir.gular  test  problems  from  More',  Garbow,  and  Hillstrom  [1981],  the  secant 
tensor  method  was  better  than  Broyden’s  method  on  9  problems,  worse  on  5,  and  about  the 
same  on  18,  with  an  average  improvement  in  function  evaluations  of  9%.  These  results  are 
marginally  worse  than  the  results  for  the  linear  model  multiple  secant  method  given  in  Section 
3.1.  Thus  adding  multiple  interpolation  conditions  to  the  linear  model  seems  to  help  a  bit  on 
nonsingular  problems,  but  the  tensor  term  seems  to  give  no  additional  help. 

On  the  test  problems  with  rank  F  (i^)  ■=  n-l,  the  secant  tensor  method  was  better  then 
Broyden’s  method  on  18  problems,  worse  on  1,  and  tied  on  4,  with  an  average  improvement  of 
25%.  On  the  test  problems  with  rank  F  (*„)  —  n—?.  the  secant  tensor  method  was  better 
than  Brovden  s  method  on  18  problems,  worse  on  1,  and  tied  on  1,  with  an  average  improve¬ 
ment  of  33%.  In  both  of  these  cases,  the  linear  model  multiple  secant  method  was  not  appreci¬ 
ably  better  than  Broyden  s  method.  Thus  the  addition  of  a  tensor  term  seems  to  help  consider¬ 
ably  on  problems  with  a  low  rank  singularity  at  the  solution. 


In  over  99°o  of  the  iterations  on  each  test  set,  the  rank  of  the  tensor  term  was  0  or  1. 
Implementing  a  secant  tensor  method  with  a  rank  one  tensor  requires  little  additional  storage 
and  very  few  additional  arithmetic  operations  in  comparison  to  Broyden’s  method.  Thus  it 
appears  that  the  gains  mentioned  above  can  be  obtained  at  little  additional  cost  to  the  com¬ 
puter,  and  a  reasonably  small  increase  in  the  complexity  of  the  code.  Therefore,  such  a  secant 
tensor  code  might  be  a  useful  general  purpose  alternative  to  a  Broyden’s  method  code  in  a  set¬ 
ting  where  singular  or  ill-conditioned  problems  are  solved  regularly. 


4.  Extensions  of  Tensor  Methods  to  Optimization  Problems 

The  nonlinear  least  squares  problem 

min  IIF(*)II2  ,  F  :  Rm  —*  R"  (41) 

reR" 

can  be  viewed  as  an  overdetermined  version  of  the  nonlinear  equations  problem  (1.1).  For  non¬ 
linear  least  squares,  the  Jacobian  matrix  always  is  computed  analytically  or  approximated  by 
finite  differences.  Thus  it  is  natural  to  consider  extending  the  derivative  tensor  method  sum¬ 
marized  in  Section  2  to  solving  (4.1). 

The  derivative  tensor  method  extends  to  overdetermined  systems  of  equations  with  very 
little  change.  The  formation  of  the  tensor  model  is  unchanged  except  that  the  model  has  rn 
quadratic  components  rather  than  n.  The  solution  procedure  outlined  in  Section  2.2  now 
reduces  the  quadratic  equations  to  the  solution,  in  a  least  squares  sense,  of  m  n  *-p  quadratic 
equations  in  p  unknowns,  followed  by  the  solution  of  n—  p  linear  equations  in  n  p  unknowns. 
(If  p  -  0  this  is  just  the  QR  algorithm  for  solving  the  linear  model  in  the  least  squares  sense.) 
The  cost  in  arithmetic  operations  and  computer  storage  again  is  hardly  more  than  for  a 
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standard  method  for  nonlinear  least  squares. 

A  tensor  method  for  nonlinear  least  squares  along  these  lines  currently  is  being  imple¬ 
mented  at  the  University  of  Colorado.  This  approach  is  most  closely  related  to  other  nonlinear 
equations  based  approaches  to  nonlinear  least  squares.  Of  these  the  most  computationally 
efficient  appears  to  be  More’s  trust  region  Levenberg  Marquardt  algorithm  (More'  )  1 9 7 8 ) ) , 
implemented  in  MINPACK  (More,  Garbow,  and  Hillstrom  [1980]),  and  it  will  be  interesting  to 
see  how  the  tensor  method  compares  to  this  on  full  rank  and  rank  deficient  problems.  An 
alternate  approach  to  nonlinear  least  squares,  related  more  closely  to  viewing  the  problem  as  a 
special  case  of  unconstrained  optimization,  is  embodied  in  the  NL2SOL  algorithm  of  Dennis, 
Gay  and  Welsh  1981],  We  also  intend  to  compare  the  tensor  method  for  nonlinear  least 
squares  to  this  approach. 

The  general  unconstrained  optimization  problem  is 

min  f  (x)  :  Rn  —+  R  .  H2\ 

letf* 

The  necessary  condition  for  a  solution  of  (4.2),  V/(r)  ~  0,  is  a  system  of  n  nonlinear  equations 
in  n  unknowns.  The  standard  local  method  for  (4.2),  also  called  Newton’s  method,  simply  is 
derived  by  applying  (1.4)  to  this  system  of  equations. 

Thus  it  is  tempting  to  expect  that  the  tensor  method  for  nonlinear  equations  can  be 
applied  to  unconstrained  optimization  simply  by  applying  the  methods  of  Sections  2  and  3  to 
the  system  of  equations  T’/(x)  —  0.  This  approach  has  several  major  flaws.  One  is  that  all  the 
derivatives  of  the  unconstrained  optimization  problem  are  symmetric,  -o  that  their  approxima¬ 
tion  should  be  too,  but  the  tensors  Tc  derived  in  Sections  2  and  3  are  not  3-way  symmetric. 
More  importantly,  if  an  unconstrained  optimization  problem  has  a  singular  Hessian  matrix  at 
a  local  minimizer,  then  the  projection  of  the  third  derivative  tensor  in  the  null  space 

direction  v  of  the  Hessian  (i.e.  V  f(xm )  v  v  v )  must  also  be  0.  Thus  approximating  the  third 
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derivative  for  unconstrained  optimization,  the  analog  of  approximating  the  second  derivative 
as  is  done  in  our  tensor  methods  for  nonlinear  equations,  would  not  be  expected  to  help  solve 
singular  unconstrained  optimization  problems.  It  appears  that  approximations  to  both  the 
third  and  fourth  derivative  matrices  will  be  required  to  help  solve  singular  optimization  prob¬ 
lems. 

An  approach  to  tensor  methods  for  unconstrained  optimization  that  makes  small  rank 

approximations  to  the  third  and  fourth  derivatives  currently  is  underway  at  the  University  of 

Colorado.  The  effect  of  approximating  both  third  and  fourth  derivative  matrices  is  that,  at 

each  iteration,  one  must  solve  a  small  system  of  cubic  equations  in  addition  to  the  remaining 

linear  equations.  The  storage  and  arithmetic  overhead  remains  reasonable.  Very  preliminary 

computational  results  using  this  approach  to  solve  singular  optimization  problems  are 
encouraging. 
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